This report investigates the impacts of vaccinations amongst the different development regions as we dived into the difference of life expectancy and the global coverage of the vaccination programs. Overall, it aims to act as a resource for NGOs in prioritising what vaccines to distribute, and where, to best increase life expectancy. We used a dataset from OWID (Our World In Data) for vaccination coverage and global life expectancy. We analysed the trend of the vaccination programs in different regions to deduce the impact of the vaccines and potential life expectancy gains from 1980 to 2021.
We have split up our findings in 2 sections:
Impact of Vaccines: looks into how the different vaccines on different diseases perform in prevention and increasing the global life expectancy.
Modelling and Predicting future life expectancy: looks into the different trends and future projections of life expectancy based on the global vaccination coverage.
We leveraged different methods such as Pearson’s correlation coefficient and linear regression to find the impacts of different vaccinations, and ARIMA, difference of time periods, and ACF and PACF plots to analyse the trends of the time series data on how life expectancy based on the global vaccination coverage.
Life Expectancy Data
Life Expectancy dataset from Our World in Data.
Code
le_df <- read_csv("data/life-expectancy.csv")
paged_table(le_df)Cleaning life expectancy data
Code
le_clean <- janitor::clean_names(le_df)
# rename column
le_clean <- le_clean %>%
rename(period_life_expect = period_life_expectancy_at_birth_sex_all_age_0)
paged_table(le_clean)Global Vaccination Coverage Data
Code
vaccination_df <- read_csv("data/global-vaccination-coverage.csv")
vaccination_df %>% paged_table()Cleaning vaccination data
Code
vaccination_clean <- janitor::clean_names(vaccination_df) %>% paged_table()
vaccination_cleanUsing a combination of Pearson’s correlation coefficient and linear regression, the vaccines for one dose of Salk’s polio vaccine, one dose of the meningococcal vaccine, three doses of the polio vaccine and three doses of the DPT (diphtheria, pertussis, tetanus) vaccine were found to be the most significant in increasing life expectancy globally.
Code
Code
Code
Code
In this section, we will perform a time series analysis on the life expectancy data for three UN development groups: least developed countries, less developed regions excluding the least developed countries, and more developed regions. The reason for this is due to the fact that the life expectancy are very different between these regions.
Code
development_status <- stringr::str_subset(le_clean$entity, regex("devel", ignore_case = TRUE)) %>% unique()
development_status <- c("Least developed countries", "Less developed regions, excluding least developed countries", "More developed regions")
p <- le_clean %>%
filter(entity %in% development_status) %>%
ggplot(aes(x = year,
y = period_life_expect,
color = entity)) +
geom_line() +
labs(title = "Life Expectancy by Year",
x = "Year",
y = "Life Expectancy at Birth (years)")
ggplotly(p)Building ARIMA model
Now, we use auto.arima to fit best ARIMA model for each group using life expectancy data from 1950 onward, and then forecast the life expectancy for the next 5 years.
Convert dataframes to time series
Fit best ARIMA models
Code
# install.packages("forecast")
library(forecast)
le_dev_opt <- list()
for (dev in development_status) {
le_dev_opt[[dev]] <- auto.arima(le_dev_ls[[dev]])
}
le_dev_opt$`Least developed countries`
Series: le_dev_ls[[dev]]
ARIMA(0,1,2) with drift
Coefficients:
ma1 ma2 drift
-0.4223 -0.1736 0.3957
s.e. 0.1165 0.1071 0.0477
sigma^2 = 0.9743: log likelihood = -98.47
AIC=204.94 AICc=205.54 BIC=213.99
$`Less developed regions, excluding least developed countries`
Series: le_dev_ls[[dev]]
ARIMA(2,1,0) with drift
Coefficients:
ar1 ar2 drift
0.5187 -0.5537 0.4077
s.e. 0.1007 0.1005 0.0741
sigma^2 = 0.4287: log likelihood = -69.57
AIC=147.13 AICc=147.74 BIC=156.18
$`More developed regions`
Series: le_dev_ls[[dev]]
ARIMA(0,2,1)
Coefficients:
ma1
-0.7382
s.e. 0.0909
sigma^2 = 0.09376: log likelihood = -16.33
AIC=36.65 AICc=36.83 BIC=41.15
Validate ARIMA models
The auto.arima function found the best ARIMA model for each group as follows:
- Least developed countries: ARIMA(0,1,2)
- Less developed regions, excluding least developed countries: ARIMA(2,1,0)
- More developed regions: ARIMA(0,2,1)
We can also use the sarima function to check if the residuals are white noise (no tread), the ACF of residuals are not significant, p-values of Ljung-Box test are greater than 0.05.
initial value 0.069482
iter 2 value -0.018455
iter 3 value -0.031497
iter 4 value -0.032603
iter 5 value -0.034478
iter 6 value -0.034615
iter 7 value -0.034655
iter 8 value -0.034655
iter 9 value -0.034655
iter 10 value -0.034655
iter 10 value -0.034655
iter 10 value -0.034655
final value -0.034655
converged
initial value -0.032011
iter 2 value -0.032051
iter 3 value -0.032054
iter 4 value -0.032057
iter 5 value -0.032062
iter 6 value -0.032062
iter 6 value -0.032062
iter 6 value -0.032062
final value -0.032062
converged
<><><><><><><><><><><><><><>
Coefficients:
Estimate SE t.value p.value
ma1 -0.4223 0.1165 -3.6248 0.0006
ma2 -0.1736 0.1071 -1.6204 0.1098
constant 0.3957 0.0477 8.3012 0.0000
sigma^2 estimated as 0.9330945 on 68 degrees of freedom
AIC = 2.886429 AICc = 2.891474 BIC = 3.013904
Code
sarima(le_dev_ls$`Less developed regions, excluding least developed countries`, 2, 1, 0)initial value -0.199607
iter 2 value -0.401779
iter 3 value -0.426091
iter 4 value -0.437383
iter 5 value -0.437806
iter 6 value -0.438451
iter 7 value -0.438459
iter 8 value -0.438462
iter 9 value -0.438462
iter 10 value -0.438462
iter 10 value -0.438462
iter 10 value -0.438462
final value -0.438462
converged
initial value -0.439003
iter 2 value -0.439076
iter 3 value -0.439123
iter 4 value -0.439125
iter 5 value -0.439126
iter 5 value -0.439126
iter 5 value -0.439126
final value -0.439126
converged
<><><><><><><><><><><><><><>
Coefficients:
Estimate SE t.value p.value
ar1 0.5187 0.1007 5.1526 0
ar2 -0.5537 0.1005 -5.5117 0
constant 0.4077 0.0741 5.5020 0
sigma^2 estimated as 0.4105617 on 68 degrees of freedom
AIC = 2.072301 AICc = 2.077346 BIC = 2.199776
Code
sarima(le_dev_ls$`More developed regions`, 0, 2, 1)initial value -0.939558
iter 2 value -1.154217
iter 3 value -1.164176
iter 4 value -1.170757
iter 5 value -1.170866
iter 6 value -1.170875
iter 6 value -1.170875
iter 6 value -1.170875
final value -1.170875
converged
initial value -1.185666
iter 2 value -1.185691
iter 3 value -1.185715
iter 3 value -1.185715
iter 3 value -1.185715
final value -1.185715
converged
<><><><><><><><><><><><><><>
Coefficients:
Estimate SE t.value p.value
ma1 -0.7382 0.0909 -8.1186 0
sigma^2 estimated as 0.09230321 on 69 degrees of freedom
AIC = 0.5235895 AICc = 0.5244299 BIC = 0.5878322
Forecast the life expectancy for the next 5 years
According to the ARIMA model, the life expectancy of least developed countries and less developed regions are expected to increase in the next 5 years whereas that of more developed regions is expected to decrease. However, the downside of this model is that it assume life expectancy of this year is only dependent on life expectancy in previous years, and does not take into account the external effect such as COVID pandemic since 2019.
Code
p1 <- le_dev_opt$`Least developed countries` %>%
forecast(h = 5) %>%
autoplot() +
ggtitle("5 Years Forecast for Least Developed Countries") +
xlab("Year") +
ylab("Life Expectancy")
p2 <- le_dev_opt$`Less developed regions, excluding least developed countries` %>%
forecast(h = 5) %>%
autoplot() +
ggtitle("5 Years Forecast for Less Developed Regions (Excluding Least Developed Countries)") +
xlab("Year") +
ylab("Life Expectancy")
p3 <- le_dev_opt$`More developed regions` %>%
forecast(h = 5) %>%
autoplot() +
ggtitle("5 Years Forecast for More Developed Regions") +
xlab("Year") +
ylab("Life Expectancy")
p1Code
p2Code
p3References
Dattani, S., Rodés-Guirao, L., Ritchie, H., Ortiz-Ospina, E., & Roser, M. (2023). Life Expectancy. Our World in Data. Retrieved 22 July 2024, from (https://ourworldindata.org/life-expectancy)
Immunization coverage. (n.d.). Retrieved 22 July 2024, from (https://www.who.int/news-room/fact-sheets/detail/immunization-coverage)
WHO, & UNICEF. (n.d.). Vaccination Coverage. Prepared by Our World in Data. Retrieved 22 July 2024, from (https://ourworldindata.org/grapher/global-vaccination-coverage)
Roser, M. (2023). The rise of maximum life expectancy, Our World in Data. Retrieved 22 July 2024, from (https://ourworldindata.org/the-rise-of-maximum-life-expectancy)